Smart Paper Notes

Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment

Weihang Dai, Xiaomeng Li, Wan Hang Keith Chiu, Michael D. Kuo, Kwang-Ting Cheng

Category: Computer Vision

2021-12-22

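Image regression tasks, such as bone mineral density (BMD) estimation and left-ventricular ejection fraction (LVEF) prediction, play an important role in computer-aided disease assessment. Most deep regression methods train neural networks with a single regression loss function, such as MSE or L1 loss. In this paper, we propose the first contrastive learning framework for deep image regression, namely AdaCon, which consists of a feature learning branch trained with a novel adaptive-margin contrastive loss and a regression prediction branch. Our method incorporates label distance relationships as part of the learned feature representations, which allows for better performance in downstream regression tasks. Moreover, it can be used as a plug-and-play module to improve the performance of existing regression methods. We demonstrate the effectiveness of AdaCon on bone mineral density estimation from X-ray images and left-ventricular ejection fraction prediction from echocardiograms. AdaCon yields relative improvements in MAE of 3.3% and 5.9% over state-of-the-art BMD estimation and LVEF prediction methods, respectively.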

Translated by Google Translate
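
The adaptive-margin idea in the AdaCon abstract above can be illustrated with a small sketch. The abstract does not give the loss formula, so everything here is an assumption: the triplet form, the function names, and the rule that the margin equals a `scale` factor times the difference in label distances. It only shows how label distances can shape a contrastive objective and is not the authors' loss.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def adaptive_margin_triplet_loss(anchor, pos, neg, y_anchor, y_pos, y_neg, scale=1.0):
    """Hinge-style triplet loss whose margin grows with the label gap.

    Assumption (not from the paper): the margin is
    scale * (|y_anchor - y_neg| - |y_anchor - y_pos|), so a sample whose
    label is much farther from the anchor's label must also be pushed
    farther away in feature space.
    """
    margin = scale * (abs(y_anchor - y_neg) - abs(y_anchor - y_pos))
    d_pos = euclidean(anchor, pos)
    d_neg = euclidean(anchor, neg)
    return max(0.0, d_pos - d_neg + margin)
```

With hypothetical LVEF-like labels 50, 52, and 60 and `scale=0.1`, the margin is 0.8: a negative at feature distance 3 already satisfies it (zero loss), while a negative at distance 1.5 is penalized.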

Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation

Sixing Yan, William K. Cheung, Keith Chiu, Terence M. Tong, Charles K. Cheung

Category: Computer Vision | Natural Language Processing

2022-07-04

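Automatically generating medical reports from X-ray images can assist radiologists with the time-consuming yet important task of reporting. However, achieving clinically accurate generated reports remains challenging. Modeling the underlying abnormalities with a knowledge graph approach has been found promising for improving clinical accuracy. In this paper, we introduce a novel fine-grained knowledge graph structure called the attributed abnormality graph (ATAG). The ATAG consists of interconnected abnormality nodes and attribute nodes, allowing it to better capture abnormality details. In contrast to existing methods, in which the abnormality graph is constructed manually, we propose a methodology to automatically construct the fine-grained graph structure from annotations, medical reports in X-ray datasets, and the RadLex radiology lexicon. We then learn the ATAG embedding using a deep model with an encoder-decoder architecture for report generation. In particular, graph networks are explored to encode the relationships among the abnormalities and their attributes. A gating mechanism is adopted and integrated with various decoders. We carry out extensive experiments on benchmark datasets and show that the ATAG-based deep model outperforms SOTA methods and can improve the clinical accuracy of the generated reports.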

Robust and Precise Facial Landmark Detection by Self-Calibrated Pose Attention Network

Jun Wan, Hui Xi, Jie Zhou, Zhihui Lai, Witold Pedrycz, Xu Wang, Hang Sun

Category: Computer Vision

2021-12-23

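Current fully supervised facial landmark detection methods have progressed rapidly and achieved remarkable performance. However, they still struggle with faces under large poses and heavy occlusions, owing to inaccurate facial shape constraints and insufficient labeled training samples. In this paper, we propose a semi-supervised framework, the Self-Calibrated Pose Attention Network (SCPAN), to achieve more robust and precise facial landmark detection in challenging scenarios. Specifically, a Boundary-Aware Landmark Intensity (BALI) field is proposed to model more effective facial shape constraints by fusing boundary and landmark intensity field information. Moreover, a Self-Calibrated Pose Attention (SCPA) model is designed to provide a self-learned objective function that requires no label information, by introducing a self-calibration mechanism and a pose attention mask. We show that by integrating the BALI field and the SCPA model into a novel self-calibrated pose attention network, more facial prior knowledge can be learned, and the detection accuracy and robustness of our method on faces are improved. Experimental results on challenging benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in the literature.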

Semi-Structured Object Sequence Encoders

Rudra Murthy V, Riyaz Bhat, Chulaka Gunasekara, Hui Wan, Tejas Indulal Dhamecha, Danish Contractor, Marina Danilevsky

Category: Computer Vision | Artificial Intelligence | Natural Language Processing

2023-01-03

In this paper we explore the task of modeling (semi-)structured object sequences; in particular, we focus on the problem of developing a structure-aware input representation for such sequences. We assume that each structured object is represented by a set of key-value pairs that encode its attributes. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling, TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation, KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as over other methods that use a record-view representation of the sequence \cite{de2021transformers4rec} or a simple flattened representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
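
The key-wise view described in this abstract, where a sequence of key-value objects becomes one value sequence per key, can be sketched as a plain data transformation. The function name `to_key_sequences`, the `missing` placeholder, and the sample records are hypothetical. The sketch covers only the input restructuring, not the TVM or KA encoders themselves.

```python
def to_key_sequences(objects, missing=None):
    """Turn a time-ordered list of key-value objects into per-key value sequences.

    Each output sequence has one entry per time step; keys absent from an
    object are filled with `missing` so all sequences stay aligned in time.
    """
    keys = sorted({key for obj in objects for key in obj})
    return {key: [obj.get(key, missing) for obj in objects] for key in keys}


# Hypothetical ticket-like records observed at three time steps.
events = [
    {"status": "open", "owner": "alice"},
    {"status": "in_progress"},
    {"status": "closed", "owner": "bob"},
]
per_key = to_key_sequences(events)
# per_key["status"] is the evolution of the "status" value over time.
```

Each per-key sequence would then feed a value encoder, and the encoded sequences would be aggregated across keys, in contrast to flattening all key-value pairs into one long token sequence.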

Surveillance Face Anti-spoofing

Hao Fang, Ajian Liu, Jun Wan, Sergio Escalera, Chenxu Zhao, Xu Zhang, Stan Z. Li, Zhen Lei

Category: Computer Vision

2023-01-03

Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while giving little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In these scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun

Category: Computer Vision

2023-01-02

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two critical issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as its start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. This mechanism can also supply the consecutive visual semantics missing from the sparsely sampled frames, enabling fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
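
The boundary-bias problem described in this abstract is easy to reproduce numerically. The sketch below assumes uniform sampling at segment centers, which is one common sparse sampling scheme and not necessarily the one used by any particular TSG method. The function names and frame counts are hypothetical.

```python
def sparse_sample(num_frames, num_samples):
    """Pick num_samples evenly spaced frame indices (centers of equal segments)."""
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def nearest_sampled(indices, frame):
    """Return the sampled index closest to an annotated frame."""
    return min(indices, key=lambda i: abs(i - frame))


# A 300-frame video downsampled to 10 frames; the annotated segment is [100, 200].
sampled = sparse_sample(300, 10)
start, end = 100, 200
new_start = nearest_sampled(sampled, start)
new_end = nearest_sampled(sampled, end)
```

Here neither annotated boundary frame survives sampling, so the closest sampled frames (105 and 195) stand in as the new boundaries. That shift is the bias the abstract refers to.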

Integrating Semantic Information into Sketchy Reading Module of Retro-Reader for Vietnamese Machine Reading Comprehension

Hang Thi-Thu Le, Viet-Duc Ho, Duc-Vu Nguyen, Ngan Luu-Thuy Nguyen

Category: Natural Language Processing

2023-01-01

Machine Reading Comprehension has become one of the most advanced and popular research topics in Natural Language Processing in recent years. Classifying whether a question is answerable is a significant sub-task in machine reading comprehension; however, it has received relatively little study. Retro-Reader is one of the studies that has addressed this problem effectively. However, the encoders of most traditional machine reading comprehension models, and of Retro-Reader in particular, have not been able to fully exploit contextual semantic information. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, and PhoBERT. This experiment was conducted to compare the influence of semantics on answerability classification for Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder of the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader encoder with semantics was applied for the first time to the Vietnamese Machine Reading Comprehension task and obtained positive results.

Deep Hierarchy Quantization Compression algorithm based on Dynamic Sampling

Wan Jiang, Gang Liu, Xiaofeng Chen, Yipeng Zhou

Category: Machine Learning

2022-12-30

Unlike traditional distributed machine learning, federated learning keeps data local for training and then aggregates the models on the server, which addresses the data security problems that may arise in traditional distributed machine learning. However, during training, the transmission of model parameters can impose a significant load on network bandwidth. It has been pointed out that the vast majority of model parameters are redundant during transmission. Building on this observation, we explore the distribution of the selected subset of model parameters and propose a deep hierarchical quantization compression algorithm, which further compresses the model and reduces the network load of data transmission through hierarchical quantization of the model parameters. We also adopt a dynamic sampling strategy for client selection to accelerate model convergence. Experimental results on different public datasets demonstrate the effectiveness of our algorithm.
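
As a point of reference for the compression idea, the following is a minimal sketch of plain uniform quantization, the basic building block that quantization-based compression schemes extend. The abstract does not specify how its hierarchical levels are defined, so this is not the paper's algorithm. The function names and the min-max scaling rule are assumptions.

```python
def quantize(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] using min-max scaling."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        # All values are identical: a single code suffices.
        return [0] * len(values), lo, 0.0
    scale = (hi - lo) / levels
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical parameter slice sent from a client to the server in 8 bits each.
params = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize(params, bits=8)
restored = dequantize(codes, lo, scale)
```

Only the integer codes plus `lo` and `scale` need to be transmitted, and the reconstruction error per parameter is at most half a quantization step.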

Heterogeneous Synthetic Learner for Panel Data

Ye Shen, Runzhe Wan, Hengrui Cai, Rui Song

Category: Machine Learning (Statistics) | Machine Learning

2022-12-30

In the new era of personalization, learning the heterogeneous treatment effect (HTE) has become an inevitable trend with numerous applications. Yet most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency common in panel data. The treatment evaluators developed for panel data, on the other hand, typically ignore individualized information. To fill this gap, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learners, namely H1SL and H2SL, by leveraging state-of-the-art HTE estimators for non-panel data and generalizing the synthetic control method to allow a flexible data-generating process. We establish the convergence rates of the proposed estimators. Extensive numerical studies demonstrate the superior performance of the proposed methods over existing ones.

Pensieve 5G: Implementation of RL-based ABR Algorithm for UHD 4K/8K Content Delivery on Commercial 5G SA/NR-DC Network

Kasidis Arunruangsirilert, Bo Wei, Hang Song, Jiro Katto

Category: Machine Learning

2022-12-29

While the rollout of the fifth-generation mobile network (5G) is underway across the globe with the intention of delivering 4K/8K UHD video, Augmented Reality (AR), and Virtual Reality (VR) content to large numbers of users, coverage and throughput remain among the most significant issues, especially in rural areas, where only low-frequency-band 5G is being deployed. This calls for a high-performance adaptive bitrate (ABR) algorithm that can maximize the user quality of experience (QoE) given 5G network characteristics and the data rate of UHD content. Many recently proposed ABR techniques are machine-learning based. Among them, Pensieve is one of the state-of-the-art techniques; it uses reinforcement learning to generate an ABR algorithm based on observations of past decision performance. By incorporating the context of the 5G network and UHD content, Pensieve has been optimized into Pensieve 5G. New QoE metrics that more accurately represent the QoE of UHD video streaming on different types of devices were proposed and used to evaluate Pensieve 5G against other ABR techniques, including the original Pensieve. Results from simulation based on real 5G Standalone (SA) network throughput show that Pensieve 5G outperforms both conventional algorithms and Pensieve, with average QoE improvements of 8.8% and 14.2%, respectively. Additionally, Pensieve 5G performed well on a commercial 5G NR-NR Dual Connectivity (NR-DC) network, despite being trained solely on data from the 5G Standalone (SA) network.
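
For orientation, ABR work in the Pensieve line commonly scores a streaming session with a QoE of the general form: total chunk quality, minus a rebuffering penalty, minus a smoothness penalty for bitrate switches. The sketch below illustrates that general form only. The abstract does not define the new device-specific QoE metrics of Pensieve 5G, so the function name, the use of bitrate in Mbps as the quality term, and the penalty weights are all assumptions.

```python
def session_qoe(bitrates_mbps, rebuffer_s, rebuffer_penalty, smooth_penalty=1.0):
    """Linear QoE for one session: quality - stall penalty - switching penalty.

    bitrates_mbps: bitrate chosen for each video chunk (used as its quality).
    rebuffer_s: stall time in seconds incurred while fetching each chunk.
    rebuffer_penalty, smooth_penalty: assumed weights, not values from the paper.
    """
    quality = sum(bitrates_mbps)
    stall = rebuffer_penalty * sum(rebuffer_s)
    switches = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - stall - smooth_penalty * switches


# Three chunks: one upward switch and half a second of stalling.
qoe = session_qoe([1.0, 2.0, 2.0], [0.0, 0.5, 0.0], rebuffer_penalty=4.0)
```

An RL-based ABR agent would receive a per-chunk version of such a score as its reward, which is why the choice of QoE metric directly shapes the learned policy.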
